Single-instance Storage

Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them with a single shared copy. It is a means to eliminate data duplication and to increase efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup, and other storage-related computer software. Single-instance storage is a simple variant of data deduplication. While data deduplication may work at a segment or sub-block level, single-instance storage works at the whole-file level and eliminates redundant copies of entire files or e-mail messages.
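
The whole-file granularity can be illustrated with a small in-memory sketch. The class `SingleInstanceStore` and its methods are hypothetical, and SHA-256 is an assumed choice of content hash. The point is only that identical files collapse to one stored copy, while a file that differs by a single byte is stored in full, which is exactly where sub-block deduplication would do better.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical whole-file single-instance store (illustration only)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content digest -> the single stored copy
        self.names: dict[str, str] = {}     # file name -> digest (acts as the pointer)

    def put(self, name: str, data: bytes) -> None:
        # Whole-file granularity: the entire content is hashed as one unit.
        digest = hashlib.sha256(data).hexdigest()
        # Keep the content only if no identical file is already stored.
        self.blobs.setdefault(digest, data)
        self.names[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
store.put("a.txt", b"hello world" * 1000)
store.put("copy_of_a.txt", b"hello world" * 1000)  # identical: shares one copy
store.put("b.txt", b"hello world" * 1000 + b"!")   # one extra byte: stored whole
print(len(store.blobs), store.stored_bytes())      # 2 22001
```

Adding another identical copy costs only a new name-to-digest entry; nothing inside a file is ever shared with a different file.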


Concept

In the case of an e-mail server, single-instance storage means that a single copy of a message is held within its database while individual mailboxes access the content through a reference pointer. A common misconception, however, is that the primary benefit of single-instance storage in mail servers is a reduction in disk space requirements. Its primary benefit is in fact to greatly improve the efficiency of delivering messages sent to large distribution lists. In a mail server, disk space savings from single-instance storage are transient and drop off very quickly over time. When used in conjunction with backup software, single-instance storage can reduce the quantity of archive media required, since it avoids storing duplicate copies of the same file. Identical files are often installed on multiple computers, operating system files being a common example. With single-instance storage, only one copy of such a file is written to the backup media, reducing the space required. This becomes more important when the storage is offsite, on cloud storage such as Amazon S3. In such cases, deduplication has been reported to reduce storage costs, bandwidth costs, and backup windows by ratios of up to 10:1.
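
A minimal sketch of the mail-server case, assuming a hypothetical `MailStore` class with per-message reference counts (not the design of any particular product). Delivering to a distribution list writes the message body once and appends only a reference to each mailbox, which is where the delivery-efficiency gain comes from. The body is freed when the last mailbox deletes its reference.

```python
import itertools


class MailStore:
    """Hypothetical single-instance message store (illustration only)."""

    def __init__(self) -> None:
        self.bodies: dict[int, bytes] = {}         # message id -> the one stored body
        self.refcount: dict[int, int] = {}         # message id -> mailboxes referencing it
        self.mailboxes: dict[str, list[int]] = {}  # mailbox -> message ids (pointers)
        self.body_writes = 0                       # bodies written to storage
        self._ids = itertools.count(1)

    def deliver(self, body: bytes, recipients: list[str]) -> int:
        if not recipients:
            raise ValueError("a message needs at least one recipient")
        msg_id = next(self._ids)
        self.bodies[msg_id] = body                 # written once, whatever the list size
        self.body_writes += 1
        self.refcount[msg_id] = len(recipients)
        for mailbox in recipients:
            self.mailboxes.setdefault(mailbox, []).append(msg_id)
        return msg_id

    def delete(self, mailbox: str, msg_id: int) -> None:
        self.mailboxes[mailbox].remove(msg_id)     # drop this mailbox's pointer
        self.refcount[msg_id] -= 1
        if self.refcount[msg_id] == 0:             # last reference gone: free the body
            del self.bodies[msg_id]
            del self.refcount[msg_id]


store = MailStore()
staff = [f"user{i}" for i in range(500)]
msg = store.deliver(b"Quarterly all-hands notice", staff)
print(store.body_writes, store.refcount[msg])      # 1 500
for user in staff:
    store.delete(user, msg)
print(msg in store.bodies)                         # False
```

In this sketch the cost of a large distribution list grows only through cheap reference appends, not through repeated body writes.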

Novell GroupWise was built on single-instance storage, which accounts for its large capacity. ISO CD/DVD image files can be optimized with SIS to reduce the size of a CD/DVD compilation (if it contains enough duplicated files) so that it fits on smaller media. SIS is related to system-wide duplicate-file search and multiple-file-instance detection tools, such as the P2P application BearShare (versions 5.n and earlier), but differs in that SIS reduces storage utilization automatically and creates and retains symbolic links, whereas BearShare lets the user delete duplicates manually, along with their associated user-level, Windows Explorer-style icon links.


Microsoft

SIS was introduced with the Remote Installation Services feature of Windows 2000 Server. A typical server might hold ten or more unique installation configurations (perhaps with different device drivers or software suites), yet perhaps only 20% of the data is unique between configurations. Microsoft states that "SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the SIS Common Store, and replaces other copies with pointers to the stored versions." Files are compared solely by their hash values, so files with different names or dates can be consolidated as long as the data itself is identical.
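
The process Microsoft describes can be approximated in Python. This is a hypothetical sketch, not Microsoft's implementation. Files are grouped by a SHA-256 digest of their contents, so names and dates play no part. One copy of each duplicated content goes into a directory standing in for the SIS Common Store, and every original path is replaced by a hard link to it. Hard links are an assumed stand-in for SIS pointers: they share writes, so writing through one path changes every linked copy, and the sketch is only suitable for data that will not be modified. The common-store directory must lie outside the scanned tree and on the same file system.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_digest(path: Path, block_size: int = 1 << 16) -> str:
    """SHA-256 of a file's bytes; names and timestamps play no part."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def consolidate(volume: Path, common_store: Path) -> dict[str, list[Path]]:
    """Hypothetical groveler: merge files under `volume` with identical content.

    `common_store` must be outside `volume` and on the same file system.
    """
    common_store.mkdir(parents=True, exist_ok=True)
    groups: dict[str, list[Path]] = {}
    for path in sorted(volume.rglob("*")):
        if path.is_file() and not path.is_symlink():
            groups.setdefault(content_digest(path), []).append(path)

    merged: dict[str, list[Path]] = {}
    for digest, paths in groups.items():
        if len(paths) < 2:
            continue                           # unique content: leave it alone
        stored = common_store / digest
        if not stored.exists():
            os.link(paths[0], stored)          # keep one copy in the common store
        for p in paths:
            p.unlink()
            os.link(stored, p)                 # replace each copy with a link
        merged[digest] = paths
    return merged


with tempfile.TemporaryDirectory() as tmp:
    volume, common = Path(tmp, "volume"), Path(tmp, "common")
    (volume / "cfg1").mkdir(parents=True)
    (volume / "cfg2").mkdir()
    (volume / "cfg1" / "driver.sys").write_bytes(b"same bytes")
    (volume / "cfg2" / "renamed.sys").write_bytes(b"same bytes")  # different name
    (volume / "cfg2" / "unique.sys").write_bytes(b"other bytes")
    merged = consolidate(volume, common)
    merged_groups = len(merged)
    link_count = os.stat(volume / "cfg1" / "driver.sys").st_nlink
    renamed_ok = (volume / "cfg2" / "renamed.sys").read_bytes() == b"same bytes"

print(merged_groups, link_count)               # 1 3
```

The two differently named files end up as links to a single stored copy, while the file with unique content is left untouched.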

Windows Server 2003 Standard Edition has SIS capabilities, but they are limited to OEM operating system installations. The file-based Windows Imaging Format (WIM), introduced in Windows Vista, also supports single-instance storage. Single-instance storage was a feature of Microsoft Exchange Server from version 4.0 and is also present in Microsoft's Windows Home Server. In Exchange Server 2007 it deduplicates only attachments, and it was dropped completely in Exchange Server 2010. Microsoft announced Windows Storage Server 2008 (WSS2008), with single-instance storage, on June 1, 2009, and states that this feature is not available on Windows Server 2008. SIS has been officially deprecated since Windows Server 2012, which introduced a new chunk-based data deduplication mechanism. This mechanism is more powerful than SIS because it can deduplicate files that merely share stretches of identical data, rather than requiring entire files to be identical. Since Windows Server 2019, this data deduplication feature has been fully supported on ReFS.
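
The difference from whole-file SIS can be seen in a chunk-level sketch. The `ChunkStore` class and the 4 KiB chunk size are hypothetical, and fixed-size chunks are a simplifying assumption: production deduplication engines generally use variable-size, content-defined chunk boundaries so that an insertion does not shift every later chunk. Two 40 KiB images that differ in one chunk share the other nine, so eleven chunks are stored instead of the twenty chunks' worth that whole-file SIS would keep.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


class ChunkStore:
    """Hypothetical chunk-level deduplicating store (illustration only)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}      # chunk digest -> single stored chunk
        self.files: dict[str, list[str]] = {}   # file name -> ordered chunk digests

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store each distinct chunk once
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        # Reassemble the file from its ordered chunk list.
        return b"".join(self.chunks[d] for d in self.files[name])


# Ten distinct 4 KiB chunks; the edited copy replaces only the chunk at index 5.
original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
edited = original[:5 * CHUNK_SIZE] + b"X" * CHUNK_SIZE + original[6 * CHUNK_SIZE:]

store = ChunkStore()
store.put("original.img", original)
store.put("edited.img", edited)
print(len(store.chunks))  # 11
```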


See also

* Capacity optimization
* Data deduplication
* Peer-to-peer file sharing
* WinFS

